Add MT-Bench and PR-Bench Support #9

danmcp · 2024-06-17T17:57:24Z

No description provided.

Signed-off-by: Ali Maredia <[email protected]>

nathan-weinberg

Initial round of comments; I will take a more in-depth look as well.

requirements.txt

src/instructlab/eval/answers.py

src/instructlab/eval/common.py

src/instructlab/eval/data/mt_bench/reference_answer/gpt-4.jsonl

nathan-weinberg · 2024-06-24T15:47:39Z

src/instructlab/eval/pr_bench_generator.py

+    with open(fn, "r", encoding="utf-8") as file:
+        contents = yaml.safe_load(file)
+    return contents.get("seed_examples")


Are we checking the YAML/schema validity at all?

I don't think we need it for this PR specifically, but @bjhargrave is working on a Dev Doc to get this functionality into the instructlab-schema package: instructlab/dev-docs#101
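For reference while that work is pending, a lightweight check could look something like the sketch below. It is not part of this PR and is not the instructlab-schema approach. The function name `validate_seed_examples` and the required keys `question` and `answer` are assumptions made for illustration. It operates on the object already returned by `yaml.safe_load`, so it adds no dependencies.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema: these key names are assumed for illustration only.
REQUIRED_KEYS = ("question", "answer")


def validate_seed_examples(contents: Any) -> list[dict]:
    """Return the seed examples from parsed YAML contents, or raise ValueError."""
    if not isinstance(contents, dict):
        raise ValueError("top-level YAML node must be a mapping")

    seed_examples = contents.get("seed_examples")
    if not isinstance(seed_examples, list) or not seed_examples:
        raise ValueError("'seed_examples' must be a non-empty list")

    for index, example in enumerate(seed_examples):
        if not isinstance(example, dict):
            raise ValueError(f"seed_examples[{index}] must be a mapping")
        missing = [key for key in REQUIRED_KEYS if key not in example]
        if missing:
            raise ValueError(f"seed_examples[{index}] is missing keys: {missing}")

    return seed_examples
```

Failing early with a `ValueError` would surface a malformed file at load time, rather than as a `None` return from `contents.get("seed_examples")` further down the pipeline.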

alimaredia · 2024-06-24T21:04:06Z

src/instructlab/eval/mt_bench_common.py

+    for _ in range(API_MAX_RETRY):
+        try:
+            messages = conv.to_openai_api_messages()
+            if messages[0]["role"] == "system" and messages[1]["role"] == "user":


@xukai92 do we need to change this to what's in here? xukai92/FastChat@5d44295. You have an issue for it here: #11

I went ahead and included it. Where are we expecting the env var to be set, though?
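The thread does not show what the referenced commit changes or which environment variable is meant. One possible reading of the condition in the snippet is that a leading system message gets folded into the first user message when a flag is set. The sketch below illustrates that reading only. The variable name `EXAMPLE_MERGE_SYSTEM_USER_MESSAGE` and the function `merge_system_into_user` are hypothetical and do not come from the PR or from FastChat.

```python
import os

# Hypothetical variable name; the thread does not say which one is used.
MERGE_ENV_VAR = "EXAMPLE_MERGE_SYSTEM_USER_MESSAGE"


def merge_system_into_user(messages, env=None):
    """Return a new message list with the system prompt folded into the
    first user message, but only when the (assumed) flag is enabled."""
    env = os.environ if env is None else env
    enabled = env.get(MERGE_ENV_VAR, "").lower() in ("1", "true", "yes")

    if (
        enabled
        and len(messages) >= 2  # guard: indexing messages[1] needs two entries
        and messages[0]["role"] == "system"
        and messages[1]["role"] == "user"
    ):
        merged = dict(messages[1])  # copy so the caller's list is untouched
        merged["content"] = messages[0]["content"] + "\n" + messages[1]["content"]
        return [merged] + list(messages[2:])

    return list(messages)
```

Passing the environment in as a parameter keeps the lookup in one place and makes the behavior easy to test, which is one way to answer the question of where the variable is expected to be set.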

README.md

tests/test_branch_gen_answers.py

alimaredia · 2024-06-25T14:53:21Z

The comments in my review are not meant to block the merging of this PR. They point out, or ask questions about, follow-up work.

Signed-off-by: Dan McPherson <[email protected]>

JamesKunstle

This PR seems like it's in good enough shape to start iterating on.

alimaredia added 5 commits June 17, 2024 13:33

Initial commit trying to merge Fastchat mtbench into eval repo

fefdb92

Signed-off-by: Ali Maredia <[email protected]>

merge gen_api_answer.py into codebase

3e5cac4

Signed-off-by: Ali Maredia <[email protected]>

show what filesystem looks like after gen_judgement.py is run

810dcfc

Signed-off-by: Ali Maredia <[email protected]>

ran through show_result step

cf2ea02

Signed-off-by: Ali Maredia <[email protected]>

start to incorporate gen_judgement.py into library

ad020ef

Signed-off-by: Ali Maredia <[email protected]>

danmcp force-pushed the main branch 23 times, most recently from 7b11486 to a9c16f2 Compare June 19, 2024 20:24

danmcp changed the title ~~WIP: Add MT-Bench Support~~ Add MT-Bench Support Jun 19, 2024

danmcp force-pushed the main branch from a9c16f2 to de4f44d Compare June 19, 2024 21:45

danmcp force-pushed the main branch from 9882a55 to b8b62c3 Compare June 24, 2024 15:40

nathan-weinberg reviewed Jun 24, 2024

nathan-weinberg mentioned this pull request Jun 24, 2024

Add test runner to CI #12

Closed

danmcp force-pushed the main branch 13 times, most recently from 55e6346 to 1594a73 Compare June 24, 2024 23:43

alimaredia approved these changes Jun 25, 2024

danmcp force-pushed the main branch 7 times, most recently from d9289cf to 3e21919 Compare June 25, 2024 15:42

Working toward a functional state

68185e4

Signed-off-by: Dan McPherson <[email protected]>

danmcp force-pushed the main branch from 3e21919 to 68185e4 Compare June 25, 2024 16:20

JamesKunstle approved these changes Jun 25, 2024

nathan-weinberg approved these changes Jun 25, 2024

nathan-weinberg merged commit ffe1aa1 into instructlab:main Jun 25, 2024
8 checks passed
